ESTIMATION ISSUES IN BOND RATING MODELS
ABSTRACT 

The  purpose  of  this  study  is  to  examine  the  bond  rating  model  estimation 
process  generally,  and  more  specifically,  to  consider  some  of  the  issues 
relating  to  hypothesis  tests  using  alternative  rating  models.   Evaluation 
measures  are  presented  and  discussed  which  can  be  used  to  compare  alternative 
models  or  alternative  estimation  methods.   A  sample  of  newly  issued  public 
utility  bonds  is  used  to  compare  the  results  of  different  statistical  estimation 
methods  (ordinary  least  squares,  ordered  probit  and  conjoint  analysis), 
alternative  dependent  variable  measurements  (evaluating  the  use  of  plus  and 
minus  modifiers)  and  alternative  sets  of  independent  variables  (addition  of 
qualitative  variables  representing  regulatory  environment  and  nuclear 
involvement).

Results  indicate  that  a  measure  of  variance  between  predicted  and  actual 
ratings  is  the  most  sensitive  measure  for  evaluating  model  fit.   No  systematic 
differences  are  found  between  ordinary  least  squares,  ordered  probit  and 
conjoint  analysis  estimation  methods.   The  use  of  modifiers  (Standard  &  Poor's 
plus  and  minus  categories)  in  classification  models  does  not  improve  the  ability 
of  the  model  to  classify  bonds  back  into  broad  (unmodified)  rating  categories. 
The  addition  of  qualitative  information  on  regulatory  environment  and  nuclear 
involvement  improves  the  explanatory  power  of  the  bond  rating  prediction  model 
for  this  sample.   Issues  involved  in  using  bond  rating  prediction  models  for 
purposes  of  evaluating  the  significance  of  potentially  useful  information,  such 
as  the  qualitative  factors  in  this  study,  are  discussed. 


I.   INTRODUCTION

The topic of bond rating prediction has received a great deal of attention
in  the  finance  and  accounting  literature  [see  Kaplan  and  Urwitz,  1979  and 
Belkaoui,  1983  for  summaries  of  bond  rating  prediction  studies].   Bond  ratings 
are  important  because  they  are  viewed  as  a  measure  of  default  and  marketability 
risk  [West,  1970;  Pinches  and  Mingo,  1973].   The  literature  concerning  bond 
ratings  is  extensive,  but,  for  the  most  part,  each  study  represents  a  single 
sample  and  a  specific  approach  to  modeling  the  bond  rating  process.   There  has 
been  little  explicit  discussion  of  the  alternative  approaches  that  can  be  used 
to  model  the  bond  rating  process  or  the  issues  involved  in  comparing  performance 
of  different  models. 

The purpose of the present study is to examine the bond rating model estimation
process generally and, more specifically, to consider some of the issues
relating  to  hypothesis  tests  using  alternative  bond  rating  models.  The  study  is 
concerned  primarily  with  methodological  issues.   A  framework  for  evaluating 
alternative  bond  rating  models  and/or  alternative  estimation  methods  is 
developed.   Various  measures  of  fit  that  can  be  used  for  evaluating  a  model 
and/or  estimation  method's  performance  are  discussed.   Two  statistics,  Kendall's 
tau  and  the  variance  between  predicted  and  actual  ratings,  are  suggested  as 
useful  measures  of  fit  in  addition  to  measures  used  in  previous  studies.   The 
use  of  these  statistics,  along  with  measures  employed  in  previous  studies,  is 
then  illustrated  by  evaluating  alternative  models  and  estimation  methods  applied 
to  a  sample  of  newly  issued  public  utility  bonds. 

The  evaluation  measures  presented  and  discussed  here  can  be  used  to  compare 
either models or estimation methods. Three applications are presented: (1)
comparison of statistical estimation methods, (2) comparison of alternative
dependent  variable  measurements  and  (3)  comparison  of  alternative  sets  of 
independent  variables.  Two  of  the  statistical  estimation  procedures  have  been 
used  in  previous  studies:   ordinary  least  squares  regression  and  ordered  probit. 
The  third  procedure,  conjoint  analysis,  has  not  been  used  in  previous  studies. 
In  addition  to  representing  the  dependent  variable  in  the  usual  way  for  Standard 
and  Poor's  ratings,  a  second  measure  of  the  dependent  variable  including  the  use 
of  plus  and  minus  modifiers  is  evaluated.   Finally,  the  improvement  in  model  fit 
from  addition  of  qualitative  variables  representing  regulatory  environment  and 
nuclear  involvement  factors  is  analyzed. 

The  next  section  of  the  paper  reviews  bond  rating  prediction  and  discusses 
hypothesis  testing  using  bond  rating  models.  Generally,  the  comparison  of 
alternative  bond  rating  models  involves  the  test  of  a  hypothesis  that  additional 
variables improve the fit between a model and a sample. Bond rating models are
increasingly being used to test the usefulness of various types of information.
There has been little explicit discussion of the use of bond rating models in
hypothesis testing, and providing a comprehensive analysis of this issue is
one of the primary contributions of this paper. Section III explores the process
of  evaluating  and  comparing  bond  rating  models  and  estimation  methods.   Section 
IV  describes  the  sample  of  newly  issued  bonds  that  is  used  to  illustrate  the 
measures  of  comparison  suggested  in  Section  III.   Section  V  presents  the  results 
of  the  various  estimation  methods  and  tests  of  comparison.   The  last  section 
provides  a  brief  summary  of  the  paper  and  several  conclusions. 

II.   BOND  RATING  PREDICTION  AND  HYPOTHESIS  TESTING 

Numerous  studies  examine  the  predictability  of  bond  ratings  [Kaplan  and 
Urwitz, 1979 and Belkaoui, 1983]. The general conclusion is that 60-70% of a
sample of bond ratings can be correctly classified by a simple linear model.
For  the  most  part,  the  models  used  in  past  studies  have  been  chosen  on  the  basis 
of  statements  by  the  rating  agencies  about  factors  considered  in  rating  a  bond 
(e.g.,  Standard  &  Poor's  Rating  Guide)  or  other  empirical  techniques  such  as 
factor analysis [Melicher, 1974] and stepwise regression [Horrigan, 1966].
Results  appear  to  be  quite  robust  with  respect  to  the  particular  financial 
ratios  used  in  the  models,  probably  because  of  the  high  degree  of 
interrelatedness  of  financial  ratios.   It  is  interesting  to  point  out,  however, 
that a 67% prediction accuracy is not necessarily large enough to be useful
[Altman et al., 1981]. In fact, the rating agencies claim that substantial
importance  is  attached  to  qualitative  factors  which  have  not  yet  been 
successfully  modeled. 

The  emphasis  on  additional  factors  which  might  be  significant  to  a  bond's 
rating  has  led  to  hypothesis  tests  concerning  the  significance  of  potentially 
useful additional information. Previous studies which have evaluated the benefit
from including additional information in a bond rating model include Baran,
Lakonishok, and Ofer [1980], Iskandar [1986], Martin and Henderson [1983], Maher
[1987] and Reiter [1987]. Baran et al. [1980] compare the classification accuracy
of bond rating models using financial variables estimated via historical
cost, general price level, and a combination of the two methods. The results
show the greatest classification accuracy for the model that includes both price
level and historical cost variables. Iskandar [1986] finds that the addition of
bond covenant variables to the standard bond rating model increases classification
accuracy. Martin and Henderson [1983] examine the differential predictive
ability  of  models  that  include  traditional  versus  pension-adjusted  financial 
ratios and show that pension-adjusted financial ratios significantly improve the
prediction accuracy of the bond rating model when subordination status is not
included in the model. However, when subordination status is included in the
model, prediction accuracy is not significantly better for the model using
pension-adjusted ratios, although the pension-adjusted ratios are helpful in
predicting the ratings of the lower rated bonds. Maher [1987] adds measures of
pension  liability  to  a  bond  rating  model  and  evaluates  the  significance  of  the 
coefficients  of  the  pension  variables.   Reiter  [1987]  adds  three  measures  of 
pension  liability  to  a  bond  rating  model  and  finds  a  significant  increase  in  the 
explanatory  power  of  the  model  using  a  general  linear  test  [Neter  and  Wasserman, 
1974].
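
For reference, the general linear test compares the error sum of squares of a reduced model (without the additional variables) with that of a full model (with them). The notation below is the standard textbook form and is an assumption here, since the cited studies' own notation is not reproduced in this paper: SSE_R and SSE_F are the error sums of squares of the reduced and full models, and df_R and df_F are their error degrees of freedom.

```latex
F^{*} = \frac{(SSE_R - SSE_F) / (df_R - df_F)}{SSE_F / df_F}
```

Under the null hypothesis that the coefficients of the added variables are all zero, and under the usual normal-error regression assumptions, F* follows an F distribution with (df_R - df_F) and df_F degrees of freedom, so a large F* indicates that the added variables improve the fit.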

While the studies cited above indicate that bond rating models can be used
to  verify  the  value  of  some  additional  information,  hypothesis  tests  based  on 
bond  rating  models  may  be  more  complex  than  they  appear  to  be.   Consider  the 
process  of  predicting  a  bond's  rating.   The  first  step  is  to  identify  whatever 
information  may  affect  the  rating.   Relevant  information  may  be  identified  by  a 
theory or preconceived framework. Alternatively, "brute force" empiricism can be
used  to  identify  correlates  of  particular  bond  ratings. 

The  second  step  in  predicting  a  bond's  rating  is  to  determine  how  the 
relevant  information  is  combined  to  produce  a  rating.   It  is  this  second  step 
which  has  received  little  attention  in  the  literature.   It  is  well  known  that 
the  linear  model  is  the  most  robust  model  for  approximating  the  decision-making 
process. In fact, the linear model approximates the relationship among variables
so well that, in some cases, a linear model approximates observations from
a non-linear process better than the non-linear model that actually created the
data [Emery, Barron, and Messier, 1982]. However, just because a linear model
approximates bond ratings better than other models does not mean that the
information relevant to a bond's rating is actually combined in a linear way to
produce the rating. It is very likely that a bond rating is produced by a complex
non-linear model that depends upon many subtle pieces of information in addition
to the more visible ones. What this means for the process of predicting a bond's
rating is that virtually all hypothesis tests are actually joint hypothesis
tests of both the relevance of the information and how that information is
combined. This leaves open the possibility that particular information is relevant
but, because it is not incorporated into the model in a linear way, the particular
information does not appear to be relevant to the bond rating process:
only a slight improvement in the model's predictions is obtained by including
the additional information.

Given  the  existing  evidence  concerning  bond  ratings,  it  seems  unlikely  that 
additional  pieces  of  information  will  be  discovered  that  will  cause  a  marked 
increase  in  the  classification  accuracy  of  a  model,  unless  the  basic  model  is 
seriously  misspecified  to  start  with.   But  the  determinants  of  bond  ratings  are 
of  intrinsic  interest  even  if  a  model's  classification  accuracy  is  not  greatly 
increased  by  their  inclusion  in  the  model.   Therefore,  it  is  important  that 
hypothesis  tests  based  on  bond  rating  models  recognize  the  implicit  simultaneous 
hypothesis  that  information  is  linearly  combined  to  produce  a  bond  rating. 

Note  that,  in  contrast  with  the  search  for  additional  determinants  of  a 
bond's  rating,  if  the  only  purpose  of  the  research  is  to  predict  bond  ratings, 
the  relevance  of  any  particular  information  is  determined  solely  by  whether  it 
improves  a  model's  ability  to  predict  ratings.   In  such  cases,  the  simultaneous 
hypothesis of how the information is combined is not relevant. Of course,
techniques developed for one purpose may turn out to be useful for some other
purpose also.


Another  important  dimension  of  hypothesis  tests  concerning  bond  ratings  is 
the  consideration  of  the  source  of  the  hypothesis.   As  previously  pointed  out, 
there  are  two  methods  of  hypothesizing  that  particular  information  is  relevant 
to  a  bond's  rating.   In  the  first  case,  a  theory  or  preconceived  framework  leads 
to a hypothesis that particular information is relevant to the bond rating
process. In order to test this hypothesis, a sample is gathered and two models are
used  to  approximate  the  bond  ratings  in  the  sample.   One  model  includes,  and  the 
other  model  excludes,  the  particular  information  on  which  the  hypothesis  test  is 
based.   The  relevance  of  the  particular  information  is  established  by  showing 
that  the  model  which  includes  the  additional  information  better  fits  the  sample 
of  bond  ratings. 

In  the  second  case,  empiricism  is  used  by  gathering  an  initial  sample  and 
examining it for whatever particular information appears to be empirically
relevant to predicting the bond ratings in the sample. In other words, an initial
sample is used to develop hypotheses about relevant information. Therefore, in
this second case, the hypotheses developed on the basis of an initial sample
must be tested on a second sample. This is because even though there may be a
good fit between the model and the initial sample, the fit may be spurious. The
second sample is used to test the hypothesis that the particular information
identified by "force fitting" the first sample is, in fact, relevant to the bond
rating  process  and  not  simply  an  irrelevant  "post  hoc"  explanation  which  happens 
to  fit  the  data. 

The distinction between whether the hypothesis is theoretically or empirically
derived bears on the reasons for using a holdout sample. In the first
case,  a  test  can  be  made  on  the  initial  sample.   Although  a  holdout  sample  may 
be useful to eliminate possible fit bias, the holdout sample is a second test.
With empirically derived hypotheses, the holdout sample is a necessity since
there  can  be  no  test  without  a  second  sample.   Therefore,  the  appropriateness  of 
using  a  holdout  sample  depends  upon  the  basis  for  the  hypothesis. 

The  type  of  holdout  sample  that  is  appropriate  depends  upon  the  purpose  of 
the research. If predicting future bond ratings is part of the research objective,
the prediction model can depend only upon information that would be available
to the decision-maker at the time of the decision. Therefore, cross-sectional
holdout samples are inappropriate for testing model predictions of future
ratings since some of the bond ratings being "predicted" actually occurred prior
to other bond ratings upon which the model's parameter estimates are based. In
other words, the model would be based on information which is unavailable at the
time of the prediction. In order to test the model's ability to predict
future bond ratings, the parameters in the model must be estimated on a sample
drawn from one time period and model predictions must be tested on a sample drawn
from  a  subsequent  time  period. 
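
The difference between the two kinds of holdout sample can be sketched in a few lines. This is only an illustration under stated assumptions: the record layout, the function names `subsequent_split` and `concurrent_split`, and the five example issues are all hypothetical and are not taken from the study's data.

```python
import random
from datetime import date


def subsequent_split(issues, cutoff):
    """Estimate on issues dated before `cutoff`; hold out those on or after it."""
    estimation = [b for b in issues if b["issue_date"] < cutoff]
    holdout = [b for b in issues if b["issue_date"] >= cutoff]
    return estimation, holdout


def concurrent_split(issues, n_holdout, seed=0):
    """Draw a random cross-sectional holdout sample of size `n_holdout`."""
    rng = random.Random(seed)
    shuffled = issues[:]          # copy so the caller's list is not reordered
    rng.shuffle(shuffled)
    return shuffled[n_holdout:], shuffled[:n_holdout]


# Hypothetical issues: dates and rating codes are illustrative only.
issues = [
    {"issue_date": date(1981, 6, 1), "rating": 4},
    {"issue_date": date(1982, 3, 15), "rating": 3},
    {"issue_date": date(1982, 11, 2), "rating": 5},
    {"issue_date": date(1983, 5, 20), "rating": 4},
    {"issue_date": date(1983, 9, 9), "rating": 2},
]

est, hold = subsequent_split(issues, date(1983, 3, 1))
est_c, hold_c = concurrent_split(issues, n_holdout=2)
```

With the subsequent split, every holdout issue is dated after every estimation issue, so the model never uses information that postdates a rating it is asked to predict. The concurrent split gives no such guarantee.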

III.   ESTIMATION  ISSUES 
III.1 Measures of Model Fit

Past studies have used model coefficients, goodness-of-fit measures, and
classification  accuracy  to  test  the  joint  hypotheses  that  relevant  information 
is  combined  in  a  linear  model  to  produce  a  bond  rating.   However,  the  meaning  of 
particular  coefficient  values  cannot  be  attributed  solely  to  information  effects 
because of the simultaneous hypothesis of linearity. When additional information
is added to the model, the contribution of each factor cannot be directly
compared  between  models  including  and  excluding  the  additional  information. 
Likewise, the usefulness of goodness-of-fit measures is limited because there
are no consistent goodness-of-fit measures across different estimation methods
(i.e., probit estimated R-squared is not comparable to OLS R-squared [Kaplan and
Urwitz, 1979]) and goodness-of-fit measures cannot be compared when different
dependent variables are used. Therefore, goodness-of-fit measures will only be
useful when comparing different sets of independent variables.

Classification  accuracy  does  not  suffer  from  the  problems  outlined  above. 
As  with  predicting  future  bond  ratings,  a  model  classifies  each  rating  correctly 
or  incorrectly  and  the  percentage  of  correctly  classified  ratings  is  a  measure 
that  does  not  depend  upon  how  many  independent  variables  the  model  includes, 
the  definition  of  the  dependent  variable  or  the  statistical  estimation  method. 
However,  classification  accuracy  can  be  criticized  as  being  a  very  "coarse" 
measure of model fit since the measure does not indicate how far the
incorrectly classified ratings are from their actual ratings.

Studies  using  classification  accuracy  have  not  always  been  clear  about  how 
the  results  of  alternative  models  should  be  compared.   In  several  studies  (e.g., 
Baran et al. [1980]), the authors simply present the percentage of correct
classification from the various models with no statistical test of the differences.
Other studies (e.g., Elam [1975] and Martin and Henderson [1983]) use a
chi-square test to evaluate differences in classification accuracy. However, only a
very large difference in classification accuracy is statistically significant
using this approach. Because of the concomitant hypothesis of linearity and the
extensive existing evidence identifying the major linear correlates of a bond's
rating, it is not clear that a substantial increase in classification accuracy
should  be  expected  from  the  inclusion  of  additional  information. 

Standard  measures  used  in  previous  studies  provide  only  a  relatively  weak 
test of the value of additional information. Consequently, some statistically
more powerful measures of model fit are suggested here which might be used as a
basis  for  hypothesis  tests  of  competing  models  and/or  methods  of  estimation. 

One  alternative  measure  of  fit  between  a  model  and  sample  data  is  Kendall's 
tau.   Kendall's  tau  measures  the  association  between  two  rank  orders  of  the  same 
items  to  determine  the  similarity  of  the  two  rank  orders.   Tau  takes  on  values 
between  plus  and  minus  one.   Tau  equals  one  if  the  rank  orders  are  identical  and 
minus one if they are exact inverses of each other. Tau is superior to the
percentage of correct predictions since it reflects not only whether the
predicted rating equals the actual rating but also, when they are not equal,
how far the predicted and actual ratings are from each other. A
second alternative measure of model fit is the variance between the predicted
and actual ratings. The variance equals zero whenever the predicted and actual
ratings are identical throughout the sample. As with Kendall's tau, the
variance depends upon how far the incorrectly predicted and actual ratings are
from each other.
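
The three measures of fit can be compared on a small example. The sketch below rests on two assumptions that are not confirmed by the text: Kendall's tau is computed in its tie-adjusted tau-b form, and the variance between predicted and actual ratings is computed as the mean squared difference between the two rating codes. The rating vectors are hypothetical.

```python
import math


def classification_accuracy(actual, predicted):
    """Share of ratings the model places in exactly the right category."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


def kendall_tau_b(actual, predicted):
    """Kendall's tau-b: rank association with an adjustment for tied pairs."""
    concordant = discordant = tied_actual_only = tied_predicted_only = 0
    n = len(actual)
    for i in range(n):
        for j in range(i + 1, n):
            da = actual[i] - actual[j]
            dp = predicted[i] - predicted[j]
            if da == 0 and dp == 0:
                continue                      # tied in both orderings
            if da == 0:
                tied_actual_only += 1
            elif dp == 0:
                tied_predicted_only += 1
            elif da * dp > 0:
                concordant += 1
            else:
                discordant += 1
    untied = concordant + discordant
    denom = math.sqrt((untied + tied_actual_only) * (untied + tied_predicted_only))
    return (concordant - discordant) / denom if denom else float("nan")


def prediction_variance(actual, predicted):
    """Mean squared distance between predicted and actual rating codes."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical rating codes (1 = lowest category); not data from the study.
actual = [1, 2, 3, 4, 5, 6]
near_miss = [1, 2, 3, 4, 6, 5]   # two errors, each one category off
far_miss = [1, 2, 3, 4, 6, 1]    # two errors, one of them five categories off

for name, predicted in [("near_miss", near_miss), ("far_miss", far_miss)]:
    print(name,
          round(classification_accuracy(actual, predicted), 3),
          round(kendall_tau_b(actual, predicted), 3),
          round(prediction_variance(actual, predicted), 3))
```

Both hypothetical models classify four of six ratings correctly, so classification accuracy cannot separate them, while tau and the variance both penalize the model whose errors lie further from the actual rating.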
III.2 Estimation Methods

Previous  bond  rating  studies  have  used  multiple  discriminant  analysis 
(MDA), ordinary least squares regression (OLS), and ordered probit as model
estimation techniques (see Kaplan and Urwitz [1979], Altman et al. [1981], and
Ederington  [1985]  for  detailed  discussions  of  these  methods).   OLS  estimation  in 
bond  rating  studies  is  criticized  on  the  grounds  that  it  requires  the  assumption 
that  the  variables  are  measured  on  an  interval  scale  [Kaplan  and  Urwitz,  1979]. 
In addition to the problems created by the use of a discrete rather than a
continuous dependent variable, one has little prior basis for assuming that there is
an equal interval between the rating pairs AAA and AA, AA and A, and so forth.
Kaplan and Urwitz [1979] suggest the use of a maximum likelihood estimation
technique called ordered probit which is appropriate for estimating models with
an ordinally measured dependent variable. Probit assumes that there is an
underlying dependent variable of theoretical interest which is interval in nature
but the variable of interest is represented by an observed ordinally measured
dependent variable [McKelvey and Zavoina, 1975]. This fits well with the
general conceptualization of the bond rating process as a quality rating continuum
broken  into  discrete  ratings  by  a  set  of  cutoff  points.   Even  though  ordered 
probit  appears  to  be  most  consistent  with  the  nature  of  the  bond  rating  problem, 
Ederington  [1985]  points  out  that  the  comparison  of  estimation  methods  on  actual 
data  is  important. 
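
The idea of a continuum broken into ratings by cutoff points can be illustrated as follows. This is a minimal sketch of the ordered probit category probabilities, assuming a unit-variance normal error; the linear index and the cutoff values are hypothetical and are not estimates from this study.

```python
import math
from bisect import bisect_left


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def category_probabilities(index, cutoffs):
    """P(rating = k), k = 0..len(cutoffs), for latent y* = index + e, e ~ N(0, 1).

    `cutoffs` must be ascending; category k covers (cutoffs[k-1], cutoffs[k]].
    """
    cdf = [0.0] + [normal_cdf(c - index) for c in cutoffs] + [1.0]
    return [cdf[k + 1] - cdf[k] for k in range(len(cutoffs) + 1)]


def predicted_category(index, cutoffs):
    """Category whose interval on the latent scale contains the linear index."""
    return bisect_left(cutoffs, index)


# Hypothetical values: three cutoffs define four ordered rating categories.
cutoffs = [-1.0, 0.0, 1.2]
index = 0.5                      # x'b for one bond, assumed for illustration
probs = category_probabilities(index, cutoffs)
category = predicted_category(index, cutoffs)
```

Because the cutoffs are estimated rather than fixed, the spacing between adjacent rating categories on the latent scale need not be equal, which is the property that distinguishes ordered probit from OLS on integer rating codes.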

Kaplan  and  Urwitz  [1979]  find  little  empirical  difference  between  the  OLS 
and  probit  estimation  methods.   Similarly,  Noreen  [1987]  finds  that  the  OLS 
estimation  method  performs  at  least  as  well  as  dichotomous  probit  with  sample 
sizes  and  input  data  typical  of  accounting  applications.   In  contrast, 
Ederington  [1985]  finds  that  classification  accuracy  using  ordered  probit  is 
statistically  superior  to  OLS.   In  light  of  these  conflicting  findings,  further 
comparison  of  the  methods  is  warranted. 

Another  method  that  is  appropriate  for  estimating  models  with  ordinally 
measured variables is conjoint analysis [Green, 1975]. Conjoint analysis has
been  applied  extensively  in  the  area  of  marketing  research  but  has  not  been  used 
in  previous  bond  rating  studies.   In  this  study,  all  three  methods,  OLS,  ordered 
probit,  and  conjoint  analysis,  are  used  to  estimate  bond  rating  models. 
III.3 Dependent Variable Measurement

The  dependent  variable  has  typically  been  measured  as  simply  the  bond's 
rating category, such as B, BB, etc., represented by the integers 1, 2,
and so forth. However, both Moody's and Standard and Poor's use modifiers, such as
plus and minus, within rating categories. While Standard and Poor's has been
using modifiers since 1975, Moody's has used modifiers only since April of 1982.
Perry [1985] finds that the use of modifiers substantially decreases classification
accuracy when the model attempts to classify the bonds back into their
exact  sub-rating  categories.   Although  it  appears  that  classifying  back  into  the 
exact  sub-rating  category  is  not  feasible,  it  is  possible  that  the  use  of  the 
sub-rating  categories  as  inputs  to  the  model  may  increase  the  ability  of  the 
model  to  classify  bonds  back  into  the  broad  rating  categories.   Consequently,  in 
this  study,  the  use  of  sub-  and  broad  rating  categories  is  compared  with  the  use 
of  broad  rating  categories  alone  to  represent  the  dependent  variable  in  the 
model  estimation  process. 

IV.   APPLICATION 

A  sample  was  created  by  identifying  all  newly  issued  public  utility  bonds 
from  March  1981  through  February  1984  that  were  rated  by  both  Standard  and 
Poor's  and  Moody's.   Bonds  were  included  in  the  sample  if  the  issuers  were 
classified  as  public  utilities  by  Moody's  Public  Utility  Manual.   After 
eliminating  observations  where  complete  information  was  unavailable  and  dropping 
from  the  sample  one  convertible  bond,  one  discount  bond  and  four  very  small 
issues  offered  on  a  "best  efforts  basis,"  the  final  sample  consists  of  281  bond 
issues.   The  distributions  of  Standard  and  Poor's  ratings  and  Moody's  ratings 
for  the  sample  are  presented  in  Table  1. 

Table  2  presents  the  variables  used  in  the  bond  rating  model  and  their 
expected  coefficient  signs.   The  bond  rating  model  includes  an  issue's  number  of 
years  to  maturity  and  other  issue  characteristics  as  well  as  several  variables 
that are specific to the environment of public utilities in the early 1980s.
NUKE1 and NUKE2 are variables that relate to nuclear plants and REG1 and REG2
are variables that relate to state regulatory commissions. Sources for the
development of financial variables include Standard and Poor's Rating Guide,
Melicher's [1974] factor analysis of utility ratios, and Altman and Katz's
[1976]  study  of  public  utility  bond  ratings.   The  financial  variables  chosen  for 
use  in  this  study  cover  the  information  found  to  be  most  important  in  previous 
studies:   cash  flow  adequacy,  asset  protection,  capitalization,  firm  size,  and 
earnings stability. The financial variables are adjusted for industry differences
by representing an individual firm's ratio by its position relative to the
industry  group  median.   Three  industry  groups  are  represented  in  the  sample 
(electric,  natural  gas,  and  telephone).   Industry  medians  are  calculated  from 
the  utilities  in  Standard  and  Poor's  40  Utilities  Index.   Descriptive  statistics 
on  the  independent  variables  are  also  presented  in  Table  2. 

The  application  of  conjoint  analysis  in  this  study  is  similar  to  the  binary 
model  in  Edmister's  [1972]  study  of  small  business  failures.   Each  financial 
variable  is  transformed  into  two  binary  variables  based  on  the  quartiles  of  the 
industry ratio distributions. For example, a binary variable RDE1 is coded one
if the value of RDE falls in the largest quartile of the distribution of
industry values. Likewise, RDE2 is coded one if the value of RDE falls in the
smallest quartile of the distribution of industry values. When both RDE1 and
RDE2 are zero, the value of RDE falls in the two quartiles surrounding the median
ratio  value  of  the  industry. 
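
The quartile coding described above can be sketched as follows. The function name `quartile_dummies`, the industry values, and the handling of observations exactly at a quartile boundary are assumptions made for illustration; the study does not state how boundary values or quartile cut points were computed.

```python
from statistics import quantiles


def quartile_dummies(value, industry_values):
    """Return (high, low) binary codes for one ratio, e.g. (RDE1, RDE2).

    high = 1 if `value` lies above the industry's upper quartile cut point,
    low = 1 if it lies below the lower quartile cut point, otherwise both 0.
    """
    lower_cut, _median, upper_cut = quantiles(industry_values, n=4)
    high = 1 if value > upper_cut else 0
    low = 1 if value < lower_cut else 0
    return high, low


# Hypothetical industry distribution of a ratio; not data from the study.
industry_rde = [0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4, 1.5]

top = quartile_dummies(1.5, industry_rde)      # firm in the largest quartile
bottom = quartile_dummies(0.8, industry_rde)   # firm in the smallest quartile
middle = quartile_dummies(1.1, industry_rde)   # firm near the industry median
```

Coding each ratio this way replaces a continuous variable with two indicators, so the estimated effect of a ratio is no longer forced to be linear across its range.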
IV.1 Comparison of Estimation Methods

A comparison of the OLS, probit, and conjoint analysis estimation methods
is  presented  in  Table  3.   When  classification  accuracy  is  computed  for  the 
estimation  sample,  a  probable  fit  bias  is  present  since  the  model  was  estimated 
on those same observations [Belkaoui, 1983]. In this comparison, two different
holdout samples are used. The subsequent holdout sample is composed of all the
bonds  issued  in  the  last  year  of  the  sample  time  period  and  has  69  of  the  281 
observations  in  it.   The  concurrent  holdout  sample  was  chosen  randomly  from  the 
complete  sample  of  281  observations  and  has  91  observations.  Observations  not  in 
the  holdout  sample  make  up  the  estimation  sample.   Models  are  estimated  on  the 
estimation  sample  and  test  statistics  are  computed  from  the  holdout  sample. 
Distribution  of  bond  ratings  for  the  concurrent  and  subsequent  holdout  samples 
is  presented  in  Table  1. 

Table  3  presents  the  test  statistics  from  using  the  three  statistical 
estimation  methods  on  the  entire  sample  and  the  two  holdout  samples  using  both 
Standard  and  Poor's  and  Moody's  ratings  as  the  dependent  variable.   The  three 
measures  of  fit,  percentage  of  correct  classification,  Kendall's  tau  and 
variance of predicted versus actual, agree in that higher classification
accuracy is generally associated with a higher Kendall's tau and a lower
variance. None of the percentages of classification accuracy are significantly
different from each other by either chi-square tests or binomial tests of
proportion. No one statistical method appears to produce a consistently higher
percentage of classification accuracy; the rank ordering of the methods changes
with sample and dependent variable definitions. Test statistics for the holdout
samples  do  not  differ  substantially  from  those  obtained  by  estimating  the  models 
on  the  entire  sample.   Since  no  significant  estimation  bias  is  apparent  from 
this  comparison,  we  do  not  use  holdout  samples  in  subsequent  comparisons. 

Conjoint analysis does not appear to perform better than ordinary least
squares or probit, so there is little justification for recommending further
applications in bond rating prediction. Probit estimation is not demonstrably
superior to ordinary least squares estimation. This result is similar to Kaplan
and Urwitz [1979] and Howard, Wilson and Elam [1983] but counter to the
superiority  Ederington  [1985]  finds  for  probit  estimation.   Although  ordinary 
least  squares  estimation  is  theoretically  inferior  to  ordered  probit,  in 
practice  it  appears  to  be  a  viable  alternative.   A  number  of  diagnostics  of 
model  fit  are  available  for  OLS  estimation  which  may  be  useful  in  a  particular 
application. 

The enhanced power of the variance between actual and predicted ratings as a
measure for comparing models is illustrated by the comparison of ordered
probit and conjoint analysis for the concurrent holdout sample using Standard &
Poor's ratings. The difference in classification accuracy is 50.5% for conjoint
analysis versus 61.5% for ordered probit, but even this large difference is not
statistically significant. Neither is the difference in Kendall's tau of .623
versus .729. The ratio of the variances (.62637 to .41758 in Table 3)
indicates, however, that the two models differ significantly under an F test for
the ratio of variances. No systematic pattern of differences is found in Table
3 that would justify a conclusion of superiority for any particular statistical
method, but one can see that the variance is a more sensitive indicator of model
differences than the percentage of correct classification.
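
The comparison in the preceding paragraph can be retraced from the Table 3 figures for the concurrent holdout sample. The sketch below is only illustrative: the paper does not spell out its test procedures, so the pooled two-proportion z test, the z test on Kendall's tau (treating the parenthesized values as standard errors and the two estimates as independent), and the use of N = 91 as the degrees of freedom for both variances are all assumptions. The function names are hypothetical.

```python
import math

def normal_two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))

def two_proportion_z(correct_a, correct_b, n):
    """Pooled z statistic for two hit rates, each measured on n bonds."""
    pooled = (correct_a + correct_b) / (2.0 * n)
    std_err = math.sqrt(pooled * (1.0 - pooled) * 2.0 / n)
    return (correct_b - correct_a) / n / std_err

def f_upper_tail(f_value, df1, df2, steps=4000):
    """P(F > f_value), using the beta form of the F distribution.

    Simpson's rule on the Beta(df1/2, df2/2) density; assumes df1, df2 > 2
    so that the density is zero at both end points. steps must be even.
    """
    a, b = df1 / 2.0, df2 / 2.0
    x = df1 * f_value / (df1 * f_value + df2)
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

    def density(t):
        if t <= 0.0 or t >= 1.0:
            return 0.0
        return math.exp((a - 1.0) * math.log(t)
                        + (b - 1.0) * math.log(1.0 - t) - log_beta)

    h = x / steps
    total = density(0.0) + density(x)
    for i in range(1, steps):
        total += (4.0 if i % 2 else 2.0) * density(i * h)
    return 1.0 - total * h / 3.0

N_HOLDOUT = 91  # concurrent holdout sample size (Table 1)

# Counts implied by Table 3: 50.549% and 61.538% of 91 bonds.
z_accuracy = two_proportion_z(46, 56, N_HOLDOUT)
p_accuracy = normal_two_sided_p(z_accuracy)

# Kendall's tau: conjoint .623 (.062) versus ordered probit .729 (.049).
z_tau = (0.729 - 0.623) / math.sqrt(0.049 ** 2 + 0.062 ** 2)
p_tau = normal_two_sided_p(z_tau)

# Variance of predicted versus actual: conjoint .62637, ordered probit .41758.
variance_ratio = 0.62637 / 0.41758
p_variance = f_upper_tail(variance_ratio, N_HOLDOUT, N_HOLDOUT)

print(f"accuracy: z = {z_accuracy:.3f}, two-sided p = {p_accuracy:.3f}")
print(f"tau:      z = {z_tau:.3f}, two-sided p = {p_tau:.3f}")
print(f"variance: F = {variance_ratio:.3f}, upper-tail p = {p_variance:.3f}")
```

Under these assumptions the accuracy and tau differences stay above the conventional 5% level while the variance ratio falls below it, which matches the pattern the text describes. Because both models are scored on the same 91 bonds, the two variances are not strictly independent, so the F probability should be read as approximate.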

IV.2  Comparison of Dependent Variable Measurement Methods

Statistics for estimating the bond's broad rating category using a
dependent variable which includes plus and minus modifiers are given in Table 4.
As can be seen, the plus and minus information does not appear to enhance model
fit at all. Perry [1985] finds a great reduction in classification accuracy
when modifiers are used in probit models and an attempt is made to classify back
into the modified rating categories. It appears that even using the modifiers
to help classify back into broad rating categories does not improve
classification. Indeed, for the probit model, the classification accuracy is
decreased (from 65.2% to 58.9%) and the ratio of the variances is 1.33 to 1.
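
To make "classifying back into broad rating categories" concrete, the sketch below maps a prediction on the plus-and-minus scale onto the broad scale using the coding in Table 1 (19 = AAA down to 6 = B+, and 6 = AAA down to 1 = B). The rounding rule, the clipping of out-of-range scores, and the sample data are assumptions made for illustration; the paper does not describe its exact procedure here.

```python
import math

# Coding taken from Table 1: modified code -> broad category code.
MODIFIED_TO_BROAD = {
    19: 6,                      # AAA
    18: 5, 17: 5, 16: 5,        # AA+, AA, AA-
    15: 4, 14: 4, 13: 4,        # A+, A, A-
    12: 3, 11: 3, 10: 3,        # BBB+, BBB, BBB-
    9: 2, 8: 2, 7: 2,           # BB+, BB, BB-
    6: 1,                       # B+
}

def to_broad(predicted_score):
    """Round a continuous prediction to the nearest modified code
    (halves round up, an assumption), clip it to 6..19, and map it
    to the broad category code."""
    code = math.floor(predicted_score + 0.5)
    code = min(19, max(6, code))
    return MODIFIED_TO_BROAD[code]

def percent_correct(predicted_scores, actual_broad):
    """Percentage of bonds whose collapsed prediction matches the
    actual broad rating category."""
    hits = sum(to_broad(p) == a for p, a in zip(predicted_scores, actual_broad))
    return 100.0 * hits / len(actual_broad)

# Hypothetical predictions and actual broad ratings, for illustration only.
example_scores = [18.7, 14.2, 12.6, 10.4, 15.5]
example_actual = [6, 4, 4, 3, 4]
example_accuracy = percent_correct(example_scores, example_actual)
print(example_accuracy)
```

A prediction that misses the modifier but stays inside the same letter grade (14.2 for an A- bond, say) still counts as correct after collapsing, which is why the modified model is compared on the broad scale.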

IV.3  Addition of Explanatory Variables

Table 5 presents tests of the incremental information content of
adding qualitative variables representing regulatory environment and nuclear
involvement to the bond rating prediction model. Classification accuracy is not
significantly improved; however, it seems unlikely that addition of incremental
information to any but a seriously misspecified model would result in a
statistically significant improvement in classification accuracy. The
coefficients of REG2 and NUKE2, representing a difficult regulatory environment
and trouble with nuclear plants respectively, are significant in both OLS and
probit models. As mentioned previously, it can be misleading to rely on
individual variable coefficients as tests of the contribution of additional
information. The general linear test [Neter and Wasserman, 1974] and its
ordered probit equivalent, the likelihood ratio test [McKelvey and Zavoina, 1975],
test the joint significance of the addition of all four variables (NUKE1, NUKE2,
REG1 and REG2). Both the general linear test and the likelihood ratio test
indicate a statistically significant improvement in explanatory power from
addition of the regulatory and nuclear involvement variables.
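
Both joint tests can be recomputed from the figures reported in Table 5. The sketch below is a minimal illustration with hypothetical function names. It assumes the general linear test takes its usual form (the drop in the sum of squared errors per added variable, divided by the full-model mean squared error) and that the likelihood ratio statistic is the difference between the two reported -2 x log likelihood ratio values.

```python
import math

def general_linear_f(sse_reduced, df_reduced, sse_full, df_full):
    """F statistic for the restriction that the added coefficients are zero."""
    added = df_reduced - df_full
    return ((sse_reduced - sse_full) / added) / (sse_full / df_full)

def chi2_upper_tail_even_df(x, df):
    """P(chi-square > x) for an even number of degrees of freedom,
    using the closed-form finite series."""
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total

# OLS panel of Table 5: SSE without and with the four qualitative variables.
f_stat = general_linear_f(104.59011, 270, 94.45515, 266)

# Ordered probit panel of Table 5: -2 x log likelihood ratio, without and with.
lr_stat = 251.7035 - 226.2738
lr_p = chi2_upper_tail_even_df(lr_stat, 4)

print(f"general linear test: F = {f_stat:.3f} on (4, 266) d.f.")
print(f"likelihood ratio test: chi-square = {lr_stat:.3f} on 4 d.f., p = {lr_p:.2e}")
```

Both values agree with the test statistics reported in Table 5 up to rounding of the published inputs.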

V.   SUMMARY  AND  CONCLUSIONS 

One contribution of this study is the consideration of the circumstances
in which it is appropriate to use bond rating models for hypothesis testing and
in which it is mandatory to use holdout samples for validation. Much of the thinking
underlying the use of bond rating prediction models derives from their early
development as empirically derived prediction models. More recently, bond
rating prediction has been used as a medium for comparison and evaluation of
various types of information. There has been little explicit discussion of the
difference in perspective involved in hypothesis testing versus simple
prediction. Another consideration in the use of bond rating models for
hypothesis testing is the realization that all hypothesis tests are actually
joint tests of the relevance of the information and the use of a linear bond
rating prediction model.

Several  more  sensitive  measures  for  evaluating  model  fit  are  suggested  and 
illustrated  in  conjunction  with  the  traditional  percentage  of  correct 
prediction.   The  variance  of  predicted  versus  actual  ratings  is  shown  to  be  most 
sensitive  to  differences  between  models.   In  addition,  no  systematic  significant 
differences  are  found  between  ordinary  least  squares,  ordered  probit  and 
conjoint  analysis  estimation  methods.   Conjoint  analysis  has  not  previously  been 
used  in  bond  rating  prediction.   The  use  of  modifiers  (Standard  &  Poor's  plus 
and  minus)  in  classification  models  does  not  appear  to  improve  the  ability  of 
the model to classify bonds back into broad rating categories. Finally, the
addition of qualitative information on regulatory environment and nuclear
involvement improves the explanatory power of bond rating prediction models for
this sample of newly issued utility bonds.




The formula for Kendall's tau is:

$$K = \frac{N_c - N_d}{\sqrt{(N - N_x)(N - N_y)}}$$

where:  $N_c$ = number of concordant pairs
        $N_d$ = number of discordant pairs
        $N$ = number of possible pairs
        $N_x$ = number of pairs tied on X
        $N_y$ = number of pairs tied on Y

[SAS, 1982]

The variance of predicted versus actual ratings is calculated as the sum of
the squared deviations between predicted and actual ratings divided by the
number of observations in the sample.
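
The three evaluation measures can be written compactly as follows. This is a sketch with hypothetical function names and made-up sample data; it assumes X is the actual rating and Y the predicted rating, and that a pair tied on both counts toward both tie totals, as in the tau-b form given above.

```python
import math
from itertools import combinations

def kendall_tau(actual, predicted):
    """Kendall's tau: (Nc - Nd) / sqrt((N - Nx)(N - Ny))."""
    nc = nd = nx = ny = 0
    for (x1, y1), (x2, y2) in combinations(zip(actual, predicted), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0:
            nx += 1          # pair tied on X (actual)
        if dy == 0:
            ny += 1          # pair tied on Y (predicted)
        if dx * dy > 0:
            nc += 1          # concordant pair
        elif dx * dy < 0:
            nd += 1          # discordant pair
    n_pairs = len(actual) * (len(actual) - 1) // 2
    denominator = math.sqrt((n_pairs - nx) * (n_pairs - ny))
    if denominator == 0:
        raise ValueError("tau is undefined when one variable is constant")
    return (nc - nd) / denominator

def variance_predicted_vs_actual(actual, predicted):
    """Sum of squared deviations divided by the number of observations."""
    return sum((p - a) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def percent_correct(actual, predicted):
    """Percentage of bonds classified into their actual category."""
    return 100.0 * sum(a == p for a, p in zip(actual, predicted)) / len(actual)

# Made-up broad category codes for four bonds, for illustration only.
actual_codes = [1, 1, 2, 3]
predicted_codes = [1, 2, 2, 3]

tau = kendall_tau(actual_codes, predicted_codes)
variance = variance_predicted_vs_actual(actual_codes, predicted_codes)
accuracy = percent_correct(actual_codes, predicted_codes)
print(tau, variance, accuracy)
```

Unlike the percentage correct, the variance penalizes a miss by two categories four times as heavily as a miss by one, which is one reason it can separate models whose hit rates are similar.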

TABLE 1
DISTRIBUTION OF BOND RATINGS

Standard & Poor's Rating:

Standard &   Plus and      Number     Broad Rating   Number     Percent
Poor's       Minus Coded   of Firms   Category       of Firms   in Sample
Rating                                Coded

AAA          19            18         6              18         6.4%
AA+          18            2
AA           17            15         5              32         11.4%
AA-          16            15
A+           15            27
A            14            35         4              94         33.5%
A-           13            32
BBB+         12            52
BBB          11            45         3              127        45.2%
BBB-         10            30
BB+          9             4
BB           8             2          2              8          2.8%
BB-          7             2
B+           6             2          1              2          .7%

Moody's Rating:

Rating       Number     Percent
Category     of Firms   in Sample

AAA          19         6.8%
AA           31         11.0%
A            121        43.1%
BBB          102        36.3%
BB                      2.8%

Concurrent Holdout  N = 91

             Standard & Poor's      Moody's
Rating       Number                 Number
Category     of Firms   Percent     of Firms   Percent

AAA          12         13.2%       13         14.3%
AA           12         13.2%       14         15.4%
A            20         22.0%       23         25.3%
BBB          45         49.5%       40         44.0%
BB           2          2.2%        1          1.1%

Subsequent Holdout  N = 69

             Standard & Poor's      Moody's
Rating       Number                 Number
Category     of Firms   Percent     of Firms   Percent

AAA          1          1.4%        1          1.4%
AA           6          8.7%        6          8.7%
A            20         29.0%       29         42.0%
BBB          36         52.2%       30         43.5%
BB           5          7.2%        3          4.3%
B            1          1.4%

TABLE 2
BOND RATING PREDICTION MODEL

Variable   Expected   Description
           Sign

Dependent variable
SR                    Bond rating - Standard & Poor's - plus and minus
SXR                   Bond rating - Standard & Poor's
MXR                   Bond rating - Moody's

Nonfinancial variables
MATYR      -          Years to maturity
SF         +          Sinking fund
MTGE       +          First mortgage
NUKE1      -          Involvement with nuclear plant
NUKE2      -          Trouble with nuclear plant
REG1       -          Regulatory cooperation necessary
REG2       -          Regulatory cooperation vital
MOOD       +          Index of consumer sentiment

Financial variables (all industry adjusted)
RCONST     +          Cash flow to construction expenditures
RPROP      -          Property funding ratio
RDE        -          Debt-equity ratio
RSIZE      +          Permanent capitalization
RROE       -          Coefficient of variation of return on equity
RCOV       +          Pretax interest coverage

Descriptive Statistics

Variable   Mean        Standard    Minimum   Maximum   Number    Percent
                       Deviation                       Coded 1   in Sample

MATYR      20.5886     10.3047     5         40
SF                                                     125       44.33
MTGE                                                   224       79.43
NUKE1                                                  116       41.13
NUKE2                                                  61        21.63
REG1                                                   147       52.13
REG2                                                   8         2.84
MOOD       74.6894     10.1024     62        100.1
CONST      43.1156     35.9208     -96       151
RCONST     .7933       .4804       .0153     3.0516
PROP       43.6503     7.0726      19.8      72.1
RPROP      .9777       .1641       .4573     1.6312
DE         48.7011     6.5897      20        67
RDE        1.0117      .1367       .4667     1.5053
SIZE       3327.6362   3006.96     95.528    16584
RSIZE      .8281       .9466       .0271     5.9588
ROE        .1237       .0756       .0133     .5699
RROE       1.3910      .8953       .1118     5.5871
COV        2.8176      .8538       1.45      6.10
RCOV       .9854       .2743       .3796     1.8560

TABLE 3
COMPARISON OF ESTIMATION METHODS

                                 ORDINARY LEAST   ORDERED       CONJOINT
                                 SQUARES          PROBIT        ANALYSIS

ENTIRE SAMPLE - STANDARD & POOR'S RATINGS
PERCENT CORRECT                  66.192%          61.210%       66.548%
KENDALL'S TAU                    .667 (.034)      .615 (.037)   .668 (.034)
VARIANCE (Predicted v. Actual)   .40214           .46263        .39858

ENTIRE SAMPLE - MOODY'S RATINGS
PERCENT CORRECT                  63.701%          59.786%       63.701%
KENDALL'S TAU                    .599 (.039)      .593 (.037)   .608 (.038)
VARIANCE (Predicted v. Actual)   .41627           .41281        .40569

CONCURRENT HOLDOUT SAMPLE - STANDARD & POOR'S RATINGS
PERCENT CORRECT                  59.341%          61.538%       50.549%
KENDALL'S TAU                    .638 (.061)      .729 (.049)   .623 (.062)
VARIANCE (Predicted v. Actual)   .53846           .41758        .62637

CONCURRENT HOLDOUT SAMPLE - MOODY'S RATINGS
PERCENT CORRECT                  53.846%          54.945%       54.945%
KENDALL'S TAU                    .625 (.057)      .663 (.057)   .630 (.056)
VARIANCE (Predicted v. Actual)   .59341           .54945        .58242

SUBSEQUENT HOLDOUT SAMPLE - STANDARD & POOR'S RATINGS
PERCENT CORRECT                  66.667%          62.319%       63.768%
KENDALL'S TAU                    .620 (.077)      .597 (.076)   .614 (.072)
VARIANCE (Predicted v. Actual)   .37681           .46377        .44928

SUBSEQUENT HOLDOUT SAMPLE - MOODY'S RATINGS
PERCENT CORRECT                  53.846%          50.725%       56.522%
KENDALL'S TAU                    .625 (.089)      .400 (.101)   .502 (.087)
VARIANCE (Predicted v. Actual)   .42029           .62319        .39130

TABLE 4
DEPENDENT VARIABLE MEASUREMENT
USE OF MODIFIERS
STANDARD & POOR'S

                                 BROAD RATING     PLUS AND
                                 CATEGORIES       MINUS

ORDINARY LEAST SQUARES
PERCENT CORRECT                  66.192%          65.836%
KENDALL'S TAU                    .667 (.034)      .669 (.032)
VARIANCE (Predicted v. Actual)   .40214           .39502

ORDERED PROBIT (AA, A AND BBB ONLY)
PERCENT CORRECT                  65.217%          58.893%
KENDALL'S TAU                    .537 (.044)      .445 (.047)
VARIANCE (Predicted v. Actual)   .40711           .54150

CONJOINT ANALYSIS
PERCENT CORRECT                  66.548%          64.413%
KENDALL'S TAU                    .668 (.034)      .645 (.036)
VARIANCE (Predicted v. Actual)   .39858           .41993

TABLE 5
TEST OF INCREMENTAL INFORMATION CONTENT

                                 WITHOUT REGULATORY AND    WITH REGULATORY AND
                                 NUCLEAR PLANT VARIABLES   NUCLEAR PLANT VARIABLES

ORDINARY LEAST SQUARES - STANDARD & POOR'S RATINGS
ADJUSTED R-SQUARED               .5790                     .6141
SUM OF SQUARED ERRORS            104.59011 (270 d.f.)      94.45515 (266 d.f.)
GENERAL LINEAR TEST              F = 7.136 (4, 266 d.f.)
PERCENT CORRECT                  66.548%                   66.192%
KENDALL'S TAU                    .626 (.038)               .667 (.034)
VARIANCE (Predicted v. Actual)   .46263                    .40214

ORDERED PROBIT - STANDARD & POOR'S RATINGS
ESTIMATED R-SQUARED              .63022                    .67497
-2 x LOG LIKELIHOOD RATIO        226.2738                  251.7035
LIKELIHOOD RATIO TEST            Chi-square = 25.429 (4 d.f.)
PERCENT CORRECT                  62.989%                   61.210%
KENDALL'S TAU                    .617 (.038)               .615 (.037)
VARIANCE (Predicted v. Actual)   .50890                    .46263